59 research outputs found

    Image Based View Synthesis

    Get PDF
This dissertation deals with the image-based approach to synthesizing a virtual scene from sparse images or a video sequence, without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of uncalibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by several static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis, and video compression. In this dissertation, I have contributed to several sub-problems related to image-based view synthesis.

Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can be approximately described by multiple planar regions, I have developed a robust and novel approach to automatically extract the set of affine or projective transformations induced by these regions, correctly detect occluded pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions are determined using correspondences in two frames; the seed regions are then expanded and outliers rejected using the graph cuts method integrated with a level set representation. Next, these initial regions are merged into several initial layers according to motion similarity. Third, occlusion order constraints over multiple frames are exploited; these guarantee that the occlusion area increases with temporal order over a short period, and they effectively maintain segmentation consistency over multiple consecutive frames. The correct layer segmentation is then obtained with a graph cuts algorithm, and the occlusions between overlapping layers are explicitly determined. Several experimental results demonstrate that our approach is effective and robust.

Recovering the geometric transformations among images of a scene is a prerequisite for image-based view synthesis. I have developed a wide-baseline matching algorithm to identify correspondences between two uncalibrated images and to further determine the geometric relationship between them, such as the epipolar geometry or a projective transformation. In our approach, a set of salient features, edge-corners, is detected to provide robust and consistent matching primitives. Then, based on the Singular Value Decomposition (SVD) of an affine matrix, we quantize the search space into two independent subspaces, one for the rotation angle and one for the scaling factor, and use a two-stage affine matching algorithm to obtain robust matches between the two frames. Experimental results on a number of wide-baseline image pairs demonstrate that our matching method outperforms state-of-the-art algorithms even under significant camera motion, illumination variation, occlusion, and self-similarity.
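To make the SVD step concrete: the 2x2 linear part of an affine transform factors into a pure rotation and a symmetric stretch, which is what lets the rotation angle and scaling factor be searched independently. The numpy sketch below illustrates that standard factorization; it is my illustration, not code from the dissertation, and it assumes an orientation-preserving transform (positive determinant).

```python
import numpy as np

def rotation_and_scale(A):
    """Split the 2x2 linear part of an affine transform as A = R @ S
    (polar decomposition via SVD): R is the nearest proper rotation and
    the singular values are the two scaling factors. Assumes det(A) > 0."""
    U, sigma, Vt = np.linalg.svd(A)
    R = U @ Vt                            # rotation part of A
    theta = np.arctan2(R[1, 0], R[0, 0])  # rotation angle in radians
    return theta, sigma                   # angle and scaling factors

# Example: a 30-degree rotation combined with a uniform scale of 1.5
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
A = 1.5 * np.array([[c, -s], [s, c]])
theta, scales = rotation_and_scale(A)
print(np.degrees(theta), scales)          # ~30.0 [1.5 1.5]
```

Quantizing the angle and the scaling factors separately turns one 4-parameter affine search into two much smaller searches, which is presumably the efficiency gain behind the two-stage matcher.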
Given the wide-baseline matches among images, I have developed a novel method for dynamic view morphing. Dynamic view morphing deals with scenes containing moving objects in the presence of camera motion. The objects can be rigid or non-rigid, and each can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images, without any 3D knowledge. The procedure consists of three steps: segmentation, morphing, and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of the affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping using the least-distortion method. I have generalized the dynamic scene synthesis problem from simple scenes with only rotation to dynamic scenes containing non-rigid objects; my method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans.

Finally, I have also developed a novel algorithm for tri-view morphing: an efficient image-based method to navigate a scene using only three wide-baseline uncalibrated images, without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide-baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied by the three images. Next, employing a trinocular-stereo algorithm and a barycentric blending technique, we generate arbitrary novel views to navigate the scene over a 2D viewpoint space. Furthermore, after self-calibration of the cameras, a 3D model can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several applications: 4D video synthesis, automatic target recognition, and multi-view morphing.
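The "physically correct path" for post-warping above comes from decomposing the affine transformation. As an illustration only, assuming the construction resembles as-rigid-as-possible interpolation of the rotation and stretch parts rather than the dissertation's exact formulation, one can move from the identity to an affine A by rotating through a linearly interpolated angle while blending the symmetric stretch linearly:

```python
import numpy as np

def least_distortion_path(A, t):
    """Intermediate transform A(t) between the identity (t=0) and a 2x2
    affine A (t=1): interpolate the rotation angle and the symmetric
    stretch separately, so the shape rotates rigidly while it deforms.
    Assumes det(A) > 0. A sketch, not the dissertation's exact method."""
    U, sigma, Vt = np.linalg.svd(A)
    R = U @ Vt                              # rotation part of A = R @ S
    S = Vt.T @ np.diag(sigma) @ Vt          # symmetric stretch part
    theta = t * np.arctan2(R[1, 0], R[0, 0])
    Rt = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
    St = (1.0 - t) * np.eye(2) + t * S      # linearly blended stretch
    return Rt @ St                          # A(0) = I, A(1) = A
```

Interpolating the angle instead of the raw matrix entries avoids the shrinking artifact that naive linear blending produces between rotated frames.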

    Regional Differential Information Entropy for Super-Resolution Image Quality Assessment

    Full text link
PSNR and SSIM are the most widely used metrics in super-resolution because they are easy to compute and evaluate the similarity between generated and reference images. However, single-image super-resolution is an ill-posed problem: multiple high-resolution images correspond to the same low-resolution image, so similarity alone cannot fully reflect restoration quality. The perceptual quality of generated images also matters, and PSNR and SSIM reflect it poorly. To address this, we propose regional differential information entropy (RDIE), a method that measures both similarity and perceptual quality. To overcome the limitation that traditional image information entropy cannot capture structural information, we measure each region's information entropy with a sliding window. Since the human visual system is more sensitive to brightness differences at low brightness, we use γ quantization rather than linear quantization. To accelerate the method, we reorganize the entropy calculation as a neural network. Experiments on our IQA dataset and on PIPAL show that RDIE better quantifies the perceptual quality of images, especially GAN-generated images.
Comment: 8 pages, 9 figures, 4 tables
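A minimal numpy sketch of the two ingredients the abstract names, sliding-window entropy and γ quantization, follows. The window size, stride, bin count, and γ value are illustrative assumptions rather than the paper's settings, and the final differential score is only one plausible reading of "differential":

```python
import numpy as np

def regional_entropy(img, window=16, stride=16, bins=32, gamma=0.5):
    """Per-window Shannon entropy of a grayscale image with values in
    [0, 1]. Gamma (rather than linear) quantization allocates finer bins
    to low brightness, where the eye is more sensitive."""
    q = np.clip((img ** gamma) * bins, 0, bins - 1).astype(np.int64)
    h, w = q.shape
    ent = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            p = np.bincount(q[y:y + window, x:x + window].ravel(),
                            minlength=bins) / float(window * window)
            p = p[p > 0]
            ent.append(-(p * np.log2(p)).sum())
    return np.array(ent)

def rdie_score(generated, reference, **kw):
    """One plausible 'differential' score: mean absolute difference of
    the regional entropy maps of the generated and reference images."""
    return np.abs(regional_entropy(generated, **kw)
                  - regional_entropy(reference, **kw)).mean()
```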

    Automatic Target Recognition Using Multi-View Morphing

    No full text
This paper describes a novel approach to automatically recognizing a target from a view morphing database constructed with our multi-view morphing algorithm. Instead of using a single reference image, a set of images or a video sequence is used to construct the reference database, with the images organized by a triangulation of the viewing sphere. At each vertex of the triangulation, one image is stored in the database as the reference view from that viewing direction. Within each triangle, our tri-view morphing algorithm can synthesize a high-quality image for an arbitrary novel viewpoint from the three neighboring reference images, and the barycentric blending scheme guarantees seamless transitions between neighboring triangles. Using the synthesized images, we apply an appearance-based recognition technique to recognize the target. In addition, the proposed method can approximately estimate the object pose or the camera motion. Several examples in the experiments demonstrate that our approach is effective and promising.
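The barycentric blending step can be sketched directly: a novel viewpoint inside a triangle of the viewing sphere gets three weights that sum to one, and a vertex's weight falls to zero on the opposite edge, which is why transitions across triangle boundaries are seamless. A minimal illustration (my sketch; the pre-warping of the three reference views toward the novel viewpoint, e.g. via the trinocular-stereo correspondences, is assumed done and not shown):

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D viewpoint p in triangle (a, b, c),
    used as blending weights for the three vertex reference views."""
    T = np.column_stack((b - a, c - a))
    u, v = np.linalg.solve(T, p - a)
    return np.array([1.0 - u - v, u, v])

def blend_views(p, triangle, warped_views):
    """Weighted blend of three reference views already warped toward
    viewpoint p (the warping step itself is not shown here)."""
    w = barycentric_weights(p, *triangle)
    return sum(wi * v for wi, v in zip(w, warped_views))
```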


    Accurate Motion Layer Segmentation And Matting

    No full text
Given a video sequence, obtaining accurate layer segmentation and alpha matting is very important for various applications. However, when a non-textured or smooth area is present in the scene, segmentation based on a single motion cue usually cannot provide satisfactory results. Conversely, most matting approaches require a smoothness assumption on the foreground and background to obtain a good result. In this paper, we combine the merits of motion segmentation and alpha matting to simultaneously achieve high-quality layer segmentation and alpha mattes. First, we exploit a general occlusion constraint and design a novel graph cuts framework to solve the layer-based motion segmentation problem for the textured regions using multiple frames. Then, an alpha matting technique is used to refine the segmentation and resolve the non-textured ambiguities by determining proper alpha values for the foreground and background, respectively.
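A common way to hand a hard motion segmentation over to a matting stage is to relax the layer boundary into a trimap, leaving an unknown band in which alpha is estimated under the compositing model I = αF + (1−α)B. The sketch below is an assumption about that hand-off, not the paper's exact procedure, and the band width is arbitrary:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def trimap_from_layer_mask(mask, band=5):
    """Relax a hard (boolean) layer mask from the graph cuts stage into
    a trimap: 1 = sure foreground, 0 = sure background, 0.5 = unknown
    band along the boundary where a matting method estimates alpha."""
    fg = binary_erosion(mask, iterations=band)   # shrink foreground
    bg = binary_erosion(~mask, iterations=band)  # shrink background
    trimap = np.full(mask.shape, 0.5)            # boundary band stays unknown
    trimap[fg] = 1.0
    trimap[bg] = 0.0
    return trimap
```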

    Motion Layer Extraction In The Presence Of Occlusion Using Graph Cut

    No full text
Extracting layers from video is very important for video representation, analysis, compression, and recognition. Assuming that a scene can be approximately described by multiple planar regions, this paper describes a robust, novel approach to automatically extract the set of affine transformations induced by these regions and accurately segment the scene into several motion layers. First, a number of seed regions are determined using correspondences between two frames. Then the seed regions are expanded and refined using a level set representation and the graph cut method. Next, these initial regions are merged into several initial layers according to motion similarity. Third, after exploiting the occlusion order constraint over multiple frames, robust layer extraction is obtained with a graph cut algorithm, and the occlusions between overlapping layers are explicitly determined. Several examples in the experiments demonstrate that our approach is effective and robust.
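The per-region affine motion that seeds this pipeline can be estimated from two-frame correspondences by a straightforward least-squares fit; in practice one would wrap it in RANSAC to reject outliers (omitted here). A minimal sketch:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src -> dst, where src and
    dst are (N, 2) arrays of corresponding points from two frames,
    N >= 3. Returns the 2x3 matrix [A | t] with dst ~= src @ A.T + t."""
    X = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)   # solve X @ M ~= dst
    return M.T                                    # 2x3 affine parameters
```

OpenCV's cv2.estimateAffine2D provides a robust (RANSAC-based) version of the same fit.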
